Systems of Nonlinear Equations
Problem. Let us find a solution x* of the system of n nonlinear equations in n unknowns,

    f(x) = 0,

where f = (f₁, f₂, …, fₙ)ᵀ : ℝⁿ → ℝⁿ.
We shall describe Newton's method and three quasi-Newton methods:
Broyden's method,
the Davidon-Fletcher-Powell (DFP) method, and
the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method.
Given a starting approximation x₀, each method computes a sequence of iterates x₁, x₂, … which, under suitable conditions, converges towards the solution x*.
Remark. The description of the methods and the examples are taken from the textbook Numerička matematika, Section 4.4.
Newton's method
The Jacobian (or Jacobi matrix) of the functions f₁, …, fₙ is the matrix of partial derivatives

    J(x) = [∂fᵢ(x)/∂xⱼ].

Given a starting approximation x₀, the Newton iteration is

    xₖ₊₁ = xₖ - sₖ,

where sₖ is the solution of the linear system J(xₖ) sₖ = f(xₖ).
Jacobians are computed using the package ForwardDiff.jl. Plotting is done using the package Plots.jl.
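As a quick illustration of the ForwardDiff.jl calls used throughout this notebook, here is a small sketch on toy functions (h and q are illustrative only, not part of the examples):

```julia
using ForwardDiff

# Toy vector function: h(x) = [x₁², x₁x₂]
h(x) = [x[1]^2, x[1]*x[2]]
Jh = ForwardDiff.jacobian(h, [1.0, 2.0])   # → [2.0 0.0; 2.0 1.0]

# Toy scalar function: gradient and Hessian
q(x) = x[1]^2 + 3x[1]*x[2]
gq = ForwardDiff.gradient(q, [1.0, 2.0])   # → [8.0, 3.0]
Hq = ForwardDiff.hessian(q, [1.0, 2.0])    # → [2.0 3.0; 3.0 0.0]
```

Since the functions are polynomials, forward-mode automatic differentiation returns the derivatives exactly (up to floating-point representation), not finite-difference approximations.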
```julia
begin
    using ForwardDiff
    using Plots
    plotly()
    using LinearAlgebra
end
```

```julia
function Newton(f::Function, J::Function, x::Vector{T}, ϵ::Float64=1e-10) where T
    iter = 0
    s = ones(T, size(x))
    ξ = x
    while norm(s) > ϵ && iter < 100
        s = J(x) \ f(x)   # Newton step: solve J(xₖ)sₖ = f(xₖ)
        ξ = x - s
        iter += 1
        x = ξ
    end
    ξ, iter
end
```

Example 1
(Dennis and Schnabel, 1996) The solutions of the system

    2(x₁+x₂)² + (x₁-x₂)² - 8 = 0,
    5x₁² + (x₂-3)² - 9 = 0,

are the points (1, 1) and, approximately, (-1.18347, 1.58684).
```julia
# Vector function
f₁(x) = [2(x[1]+x[2])^2 + (x[1]-x[2])^2 - 8, 5*x[1]^2 + (x[2]-3)^2 - 9]
```

```julia
f₁((1.0, 2))
# → [11.0, -3.0]
```

Let us plot the functions and the contours in order to approximately locate the zeros:
```julia
begin
    # Number of points
    m = 101
    X = range(-2, stop=3, length=m)
    Y = range(-2, stop=3, length=m)
    # First component
    surface(X, Y, (x,y) -> f₁([x,y])[1], xlabel="x", ylabel="y", colorbar=false)
    # Second component
    surface!(X, Y, (x,y) -> f₁([x,y])[2], seriescolor=:blues)
end
```

```julia
begin
    # Locate the solutions using contour plot
    contour(X, Y, (x,y) -> f₁([x,y])[1], contour_labels=true)
    contour!(X, Y, (x,y) -> f₁([x,y])[2], contour_labels=true)
end
```

```julia
# Clearer picture
contour!(clims=(0,0.01), xlabel="x", ylabel="y", colorbar=:none)
```

We see that the approximate zeros are near the points (-1.2, 1.6) and (1, 1).
```julia
J₁(x) = ForwardDiff.jacobian(f₁, x)
```

```julia
# For example
J₁([1.0, 2])
# → [10.0 14.0; 10.0 -2.0]
```

```julia
Newton(f₁,J₁,[-1.0,0.0]), Newton(f₁,J₁,[0.5,1.1]), Newton(f₁,J₁,[1.0,1.0]), Newton(f₁,J₁,[0.0,0.0])
# → (([-1.18347, 1.58684], 8), ([1.0, 1.0], 6), ([1.0, 1.0], 1), ([NaN, NaN], 2))
```

The iteration from the point (0, 0) fails since the Jacobian is singular there.

Example 2
(Dennis and Schnabel, 1996) The solutions of the system

    x₁² + x₂² - 2 = 0,
    e^(x₁-1) + x₂³ - 2 = 0,

are the points (1, 1) and, approximately, (-0.713747, 1.22089).
```julia
begin
    f₂(x) = [x[1]^2+x[2]^2-2, exp(x[1]-1)+x[2]^3-2]
    contour(X, Y, (x,y) -> f₂([x,y])[1], contour_labels=true)
    contour!(X, Y, (x,y) -> f₂([x,y])[2], contour_labels=true)
    contour!(clims=(0,0.01), xlabel="x", ylabel="y", colorbar=:none)
end
```

```julia
begin
    J₂(x) = ForwardDiff.jacobian(f₂, x)
    Newton(f₂, J₂, [-1.0, 1]), Newton(f₂, J₂, [0.8, 1.2])
end
# → (([-0.713747, 1.22089], 5), ([1.0, 1.0], 5))
```

Example 3
(Dennis and Schnabel, 1996) Solve the system

    x₁ = 0,
    x₂² + x₂ = 0,
    e^(x₃) - 1 = 0.

The exact solutions are the points (0, 0, 0) and (0, -1, 0).
```julia
begin
    f₃(x) = [x[1], x[2]^2+x[2], exp(x[3])-1]
    J₃(x) = ForwardDiff.jacobian(f₃, x)
end
```

```julia
Newton(f₃,J₃,[-1.0,1.0,0.0]), Newton(f₃,J₃,[1.0,1,1]), Newton(f₃,J₃,[-1.0,1,-10]), Newton(f₃,J₃,[0.5,-1.5,0])
# → (([0.0, 0.0, 0.0], 7), ([0.0, 0.0, 7.78375e-17], 7), ([0.0, 0.0666667, NaN], 2), ([0.0, -1.0, 0.0], 6))
```

From the starting point (-1, 1, -10) the iteration breaks down: e^(x₃) is nearly flat at x₃ = -10, so the Newton correction in x₃ is huge and the subsequent evaluation of exp() overflows.

Example 4
(Rosenbrock parabolic valley) Given is the function

    f₄(x) = 100(x₂ - x₁²)² + (1 - x₁)².

We want to find possible extremal points, that is, we want to solve the equation ∇f₄(x) = 0.
```julia
f₄(x) = 100(x[2]-x[1]^2)^2 + (1-x[1])^2
```

```julia
# Plot the function using X and Y from Example 1
surface(X, Y, (x,y) -> f₄([x,y]), seriescolor=:blues, xlabel="x", ylabel="y")
```

```julia
begin
    # This function is very demanding w.r.t. finding extremal points
    g₄(x) = ForwardDiff.gradient(f₄, collect(x))
    contour(X, Y, (x,y) -> g₄([x,y])[1], contour_labels=true)
    contour!(X, Y, (x,y) -> g₄([x,y])[2], contour_labels=true)
    contour!(clims=(-0.5,0.5), xlabel="x", ylabel="y", colorbar=:none)
end
```

By observing the contours, we conclude that this example is numerically demanding, while it is easy to see analytically that the only zero is the point (1, 1).
Here the vector function is given as the gradient of the scalar function f₄, and its Jacobian is computed by ForwardDiff.hessian(), which approximates the matrix of second partial derivatives of a scalar function.
```julia
Newton(g₄, x -> ForwardDiff.hessian(f₄,x), [-1.0, 2.0])
# → ([1.0, 1.0], 7)
```

Example 5
Let f₅(x) be the sum of squared residuals of the Gaussian model

    x₃ e^(-(tᵢ-x₁)²/x₂)

over the data points tᵢ defined below. In contrast to Examples 1, 2 and 3 and to Example 4, here the Hessian is badly conditioned, so the method is inaccurate and does not converge towards the exact solution.
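A reminder of why the condition number matters: when solving a linear system (such as the Newton step), a relative perturbation of the right-hand side can be amplified in the solution by up to cond(A). A small 2×2 sketch (the matrix and data are illustrative only):

```julia
using LinearAlgebra

A = [1.0 1.0; 1.0 1.0001]     # nearly singular, cond(A) ≈ 4⋅10⁴
b = [2.0, 2.0001]
x = A\b                        # exact solution [1.0, 1.0]
δb = [1e-6, 0.0]               # tiny perturbation of the right-hand side
xp = A\(b+δb)
# Relative error in x is several orders of magnitude larger than in b
amplification = (norm(xp-x)/norm(x)) / (norm(δb)/norm(b))
```

Here `amplification` is of the order 10⁴, close to cond(A); an ill-conditioned Jacobian or Hessian degrades Newton-type steps in exactly this way.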
```julia
begin
    t = collect(0:10)
    y = [0.001,0.01,0.04,0.12,0.21,0.25,0.21,0.12,0.04,0.01,0.001]
    f₅(x) = sum([(x[3]*exp(-((t[i]-x[1])^2/x[2])) - y[1])^2 for i=1:11])
end
```

```julia
begin
    # Starting point is very near the solution
    x₀ = [4.9, 2.63, 0.28]
    g₅(x) = ForwardDiff.gradient(f₅, x)
    J₅(x) = ForwardDiff.hessian(f₅, x)
    f₅(x₀), g₅(x₀), cond(J₅(x₀))
end
# → (0.157753, [2.71553e-6, 0.029986, 1.13247], 1.73704e5)
```

```julia
x₅, iter₅ = Newton(g₅, J₅, x₀, 1e-8)
# → ([6.50244, 0.00245085, -1.60888e-7], 100)
```

```julia
g₅(x₅)
# → [1.52356e-51, 2.04301e-49, -3.07371e-47]
```

```julia
Newton(g₅, J₅, [4.9,2.62,0.28], 1e-8)
# → ([NaN, NaN, NaN], 13)
```

Broyden's method
Given a starting approximation x₀ and an initial matrix B₀, in step k we solve

    Bₖ sₖ = -f(xₖ),    xₖ₊₁ = xₖ + sₖ,

and update

    Bₖ₊₁ = Bₖ + (yₖ - Bₖ sₖ) sₖᵀ / (sₖᵀ sₖ),    yₖ = f(xₖ₊₁) - f(xₖ).

In this manner, we avoid the computation of the Jacobi matrix in each step. We can take B₀ = J(x₀), or even B₀ = I.
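The defining property of this update is the secant condition Bₖ₊₁ sₖ = yₖ, which holds by construction. A small check on a toy system (f, x and B below are illustrative, not from the examples):

```julia
using LinearAlgebra

f(x) = [x[1]^2 + x[2] - 3, x[1]*x[2] - 2]
x = [1.0, 1.0]
B = Matrix{Float64}(I, 2, 2)      # crude initial approximation B₀ = I
s = -(B\f(x))                     # quasi-Newton step
ξ = x + s
y = f(ξ) - f(x)
B₁ = B + (y - B*s)*(s/(s⋅s))'     # Broyden update, as in the code below
B₁*s ≈ y                          # secant condition → true
```

The update is the smallest (in Frobenius norm) change to B that makes the new matrix map the step s onto the observed change y of the function.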
```julia
function Broyden(f::Function, B::Matrix, x::Vector{T}, ϵ::Float64=1e-10) where T
    iter = 0
    s = ones(T, length(x))
    ξ = x
    while norm(s) > ϵ && iter < 100
        s = -(B\f(x))
        ξ = x + s
        y = f(ξ) - f(x)
        B = B + (y - B*s)*(s/(s⋅s))'   # Broyden update
        x = ξ
        iter += 1
    end
    ξ, iter
end
```

```julia
# Example 1
Broyden(f₁, J₁([-1.0,0.0]), [-1.0,0.0]), Broyden(f₁, J₁([1.0,1.5]), [1.0,1.5])
# → (([-1.18347, 1.58684], 12), ([1.0, 1.0], 7))
```

```julia
begin
    # Behaviour of the method when we set B₀ = I
    eye(n) = Matrix{Float64}(I, n, n)
    Broyden(f₁, eye(2), [-1.0,0.0]), Broyden(f₁, eye(2), [1.0,1.5]), Broyden(f₁, eye(2), [-1.0,1.5])
end
# → (([-0.0356391, -2.53166], 100), ([0.916022, -3.04547], 100), ([-1.18347, 1.58684], 14))
```

With B₀ = I, the iterations from the first two starting points stall at the iteration limit, while the third one converges.

```julia
# Example 2
Broyden(f₂, J₂([-1.0,1]), [-1.0,1]), Broyden(f₂, J₂([0.8,1.2]), [0.8,1.2])
# → (([-0.713747, 1.22089], 9), ([1.0, 1.0], 9))
```

```julia
# Example 3
Broyden(f₃, J₃([-1.0,1,0]), [-1.0,1,0]), Broyden(f₃, J₃([0.5,-1.5,0]), [0.5,-1.5,0])
# → (([0.0, 5.96536e-26, 0.0], 9), ([0.0, -1.0, 0.0], 8))
```

```julia
# Example 4
Broyden(g₄, ForwardDiff.hessian(f₄, [-1.0,2]), [-1.0,2]),
# but
Broyden(g₄, ForwardDiff.hessian(f₄, [1,2.0]), [-1.0,2]),
Broyden(g₄, ForwardDiff.hessian(f₄, [0.8,0.5]), [0.8,0.5])
# → (([0.834064, 0.695042], 100), ([1.0, 1.0], 4), ([1.0, 1.0], 29))
```

```julia
begin
    # Example 5
    x₆, iter₆ = Broyden(g₅, ForwardDiff.hessian(f₅, [4.9,2.6,0.2]), [4.9,2.6,0.2])
end
# → ([18.9003, 1.32399, 0.0665517], 6)
```

```julia
g₅(x₆)
# → [1.85527e-29, -6.2359e-29, -2.07346e-29]
```

The gradient at x₆ is essentially zero, so Broyden's method converged, but to a different stationary point.

Davidon-Fletcher-Powell (DFP) method
DFP is an optimization method which finds extremal points of the function f : ℝⁿ → ℝ by solving the system ∇f(x) = 0.

Given an initial approximation x₀ and an initial symmetric positive definite matrix H₀ (an approximation of the inverse Hessian), in step k we compute

    sₖ = -Hₖ ∇f(xₖ),    xₖ₊₁ = xₖ + βₖ sₖ,

and update

    Hₖ₊₁ = Hₖ + (sₖ sₖᵀ)/(yₖᵀ sₖ) - (Hₖ yₖ)(Hₖ yₖ)ᵀ/(yₖᵀ Hₖ yₖ),    yₖ = ∇f(xₖ₊₁) - ∇f(xₖ),

where sₖ in the update denotes the actual step βₖ sₖ. We can take H₀ = I.

The one-dimensional minimization along the line xₖ + β sₖ is performed by bisection on the directional derivative.
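The DFP update enforces the secant condition on the inverse-Hessian approximation, Hₖ₊₁ yₖ = sₖ, and preserves symmetry. A toy check with illustrative values of s and y (not taken from the examples):

```julia
using LinearAlgebra

H = Matrix{Float64}(I, 2, 2)     # initial inverse-Hessian approximation
s = [0.3, -0.1]                  # a step (toy values)
y = [0.5, 0.2]                   # corresponding change of the gradient, y⋅s > 0
z = H*y
H₁ = H + (s/(y⋅s))*s' - (z/(y⋅z))*z'   # DFP update, as in the implementation below
H₁*y ≈ s                         # secant condition → true
H₁ ≈ H₁'                         # symmetry is preserved → true
```

Both properties follow directly from the rank-one structure of the two correction terms, so they hold for any s and y with y⋅s ≠ 0 and y⋅Hy ≠ 0.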
```julia
function Bisection(f::Function, a::T, b::T, ϵ::Float64=1e-10) where T
    fa = f(a)
    fb = f(b)
    x = (a+b)/2.0
    fx = f(x)
    if fa*fb > zero(T)
        # The interval does not bracket a zero - return the better endpoint
        if abs(fa) > abs(fb)
            return b, fb, 0
        else
            return a, fa, 0
        end
    end
    iter = 0
    while b-a > ϵ && iter < 1000
        x = (b+a)/2.0
        fx = f(x)
        if fa*fx < zero(T)
            b = x
            fb = fx
        else
            a = x
            fa = fx
        end
        iter += 1
    end
    x, fx, iter
end
```

```julia
function DFP(f::Function, H::Matrix, x::Vector{T}, ϵ::Float64=1e-10) where T
    iter = 0
    s = ones(T, length(x))
    ξ = x
    while norm(s) > ϵ && iter < 50
        s = -H*f(x)
        s0 = s/norm(s)
        F(ζ) = f(x+ζ*s)⋅s0          # directional derivative along s
        β, fx, iterb = Bisection(F, 0.0, 1.0, 10*eps())
        s *= β
        ξ = x + s
        y = f(ξ) - f(x)
        z = H*y
        H = H + (s/(y⋅s))*s' - (z/(y⋅z))*z'   # DFP update
        x = ξ
        iter += 1
    end
    ξ, iter
end
```

Example 6
Let us find the extremal points of the function

    f₆(x) = (x₁ + 2x₂ - 7)² + (2x₁ + x₂ - 5)².

The function has a minimum at the point (1, 3).
```julia
f₆(x) = (x[1] + 2*x[2] - 7)^2 + (2*x[1] + x[2] - 5)^2
```

```julia
f₆([1,2])
# → 5
```

```julia
g₆(x) = ForwardDiff.gradient(f₆, x)
```

```julia
DFP(g₆, eye(2), [0.8,2.7], eps())
# → ([1.0, 3.0], 4)
```

```julia
# Example 4
DFP(g₄, eye(2), [0.9,1.1])
# → ([1.0, 1.0], 9)
```

```julia
# Example 5
DFP(g₅, eye(3), [4.9,2.6,0.2])
# → ([4.89611, 46.1979, 0.00158913], 25)
```

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method
BFGS is an optimization method for finding extremal points of the function f : ℝⁿ → ℝ.

The method is similar to the DFP method, with somewhat better convergence properties.

Let yₖ = ∇f(xₖ₊₁) - ∇f(xₖ). Given an initial approximation x₀ and an initial matrix H₀, in step k we compute sₖ = -Hₖ ∇f(xₖ), set xₖ₊₁ = xₖ + βₖ sₖ, and update the inverse-Hessian approximation as

    Hₖ₊₁ = Hₖ - (sₖ yₖᵀ Hₖ + Hₖ yₖ sₖᵀ)/(yₖᵀ sₖ) + (1 + (yₖᵀ Hₖ yₖ)/(yₖᵀ sₖ)) (sₖ sₖᵀ)/(yₖᵀ sₖ),

where sₖ in the update denotes the actual step βₖ sₖ. We can take H₀ = I.

The one-dimensional minimization along the line xₖ + β sₖ is again performed by bisection.
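One reason BFGS is preferred in practice: when the curvature condition yₖ⋅sₖ > 0 holds, the update keeps Hₖ symmetric positive definite, so -Hₖ∇f(xₖ) remains a descent direction. A toy check, written with the same update expression as the implementation below (the values of s and y are illustrative only):

```julia
using LinearAlgebra

H = Matrix{Float64}(I, 2, 2)
s = [0.4, 0.1]
y = [0.6, 0.3]                   # y⋅s = 0.27 > 0 (curvature condition)
z = H*y
α = y⋅s
s1 = s/α
H₁ = H - s1*z' - z*s1' + s1*(y⋅z)*s1' + s1*s'   # BFGS update
H₁*y ≈ s                         # secant condition → true
isposdef(Symmetric(H₁))          # positive definiteness is preserved → true
```

DFP satisfies the same secant condition; the two methods differ only in how the remaining degrees of freedom of the rank-two correction are chosen.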
```julia
function BFGS(f::Function, H::Matrix, x::Vector{T}, ϵ::Float64=1e-10) where T
    iter = 0
    s = ones(T, length(x))
    ξ = x
    while norm(s) > ϵ && iter < 50
        s = -H*f(x)
        s0 = s/norm(s)
        F(ζ) = f(x+ζ*s)⋅s0          # directional derivative along s
        β, fx, iterb = Bisection(F, 0.0, 1.0, 10*eps())
        s *= β
        ξ = x + s
        y = f(ξ) - f(x)
        z = H*y
        α = y⋅s
        s1 = s/α
        H = H - s1*z' - z*s1' + s1*(y⋅z)*s1' + s1*s'   # BFGS update
        x = ξ
        iter += 1
    end
    ξ, iter
end
```

```julia
BFGS(g₆, eye(2), [0.8,2.7], eps())
# → ([1.0, 3.0], 4)
```

```julia
# Example 4
BFGS(g₄, eye(2), [0.9,1.1])
# → ([1.0, 1.0], 9)
```

```julia
# Example 5
BFGS(g₅, eye(3), [4.9,2.6,0.2])
# → ([2.08908, 31226.6, 0.00100056], 50)
```

Julia packages
The previous programs are simple illustrative implementations of the described algorithms. The Julia package NLsolve.jl is used for solving systems of nonlinear equations, and the package Optim.jl is used for nonlinear optimization.
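Both packages can compute derivatives with ForwardDiff internally instead of finite differences; a minimal sketch (the functions fdemo! and rosen are illustrative, and the autodiff=:forward keyword is the documented option of recent package versions):

```julia
using NLsolve, Optim

# A simple decoupled system with zero at (2, 1)
fdemo!(fvec, x) = (fvec[1] = x[1]^2 - 4; fvec[2] = x[2] - 1)
sol = nlsolve(fdemo!, [1.0, 0.0], autodiff=:forward)  # forward-mode AD Jacobian
# sol.zero ≈ [2.0, 1.0]

# Rosenbrock function, minimized by BFGS with an AD gradient
rosen(x) = 100(x[2]-x[1]^2)^2 + (1-x[1])^2
res = optimize(rosen, [-1.0, 2.0], Optim.BFGS(); autodiff=:forward)
# Optim.minimizer(res) ≈ [1.0, 1.0]
```

Note that Optim's solver type is written qualified as Optim.BFGS() here, since this notebook defines its own function named BFGS.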
```julia
using NLsolve
```

```julia
# Example 1
function f₁!(fvec, x)
    fvec[1] = 2(x[1]+x[2])^2 + (x[1]-x[2])^2 - 8
    fvec[2] = 5*x[1]^2 + (x[2]-3)^2 - 9
end
```

```julia
nlsolve(f₁!, [-1.0, 0])
```

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [-1.0, 0.0]
     * Zero: [-1.1834670032425283, 1.5868371427230779]
     * Inf-norm of residuals: 0.000000
     * Iterations: 5
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 6
     * Jacobian Calls (df/dx): 6

```julia
nlsolve(f₁!, [0.5, 1.1])
```

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [0.5, 1.1]
     * Zero: [1.0000000000000002, 1.0000000000000002]
     * Inf-norm of residuals: 0.000000
     * Iterations: 5
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 6
     * Jacobian Calls (df/dx): 6

```julia
begin
    # Example 2
    function f₂!(fvec, x)
        fvec[1] = x[1]^2 + x[2]^2 - 2
        fvec[2] = exp(x[1]-1) + x[2]^3 - 2
    end
    nlsolve(f₂!, [-1.0, 1]), nlsolve(f₂!, [0.8, 1.2])
end
```

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [-1.0, 1.0]
     * Zero: [-0.7137474114758742, 1.2208868222037403]
     * Inf-norm of residuals: 0.000000
     * Iterations: 4
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 5
     * Jacobian Calls (df/dx): 5

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [0.8, 1.2]
     * Zero: [0.9999999999940328, 1.0000000000115203]
     * Inf-norm of residuals: 0.000000
     * Iterations: 4
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 5
     * Jacobian Calls (df/dx): 5

```julia
begin
    # Example 3
    function f₃!(fvec, x)
        fvec[1] = x[1]
        fvec[2] = x[2]^2 + x[2]
        fvec[3] = exp(x[3]) - 1
    end
    nlsolve(f₃!, [-1.0,1.0,0.0]), nlsolve(f₃!, [1.0,1,1]), nlsolve(f₃!, [-1.0,1,-10]), nlsolve(f₃!, [0.5,-1.5,0])
end
```

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [-1.0, 1.0, 0.0]
     * Zero: [0.0, 2.3283064364709337e-10, 0.0]
     * Inf-norm of residuals: 0.000000
     * Iterations: 5
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 6
     * Jacobian Calls (df/dx): 6

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [1.0, 1.0, 1.0]
     * Zero: [0.0, 2.3283064364709337e-10, 1.223235687436471e-12]
     * Inf-norm of residuals: 0.000000
     * Iterations: 5
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 6
     * Jacobian Calls (df/dx): 6

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [-1.0, 1.0, -10.0]
     * Zero: [0.0, 4.9978109751571114e-11, 8.131707812509303e-17]
     * Inf-norm of residuals: 0.000000
     * Iterations: 20
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 21
     * Jacobian Calls (df/dx): 8

    Results of Nonlinear Solver Algorithm
     * Algorithm: Trust-region with dogleg and autoscaling
     * Starting Point: [0.5, -1.5, 0.0]
     * Zero: [0.0, -1.0000000000000007, 0.0]
     * Inf-norm of residuals: 0.000000
     * Iterations: 5
     * Convergence: true
       * |x - x'| < 0.0e+00: false
       * |f(x)| < 1.0e-08: true
     * Function Calls (f): 6
     * Jacobian Calls (df/dx): 6

```julia
using Optim
```

```julia
# Example 4
optimize(f₄, [-1.0,2], Optim.BFGS())
```

     * Status: success
     * Candidate solution
        Final objective value: 5.375030e-17
     * Found with
        Algorithm: BFGS
     * Convergence measures
        |x - x'| = 5.13e-09 ≰ 0.0e+00
        |x - x'|/|x'| = 5.13e-09 ≰ 0.0e+00
        |f(x) - f(x')| = 9.67e-17 ≰ 0.0e+00
        |f(x) - f(x')|/|f(x')| = 1.80e+00 ≰ 0.0e+00
        |g(x)| = 2.10e-11 ≤ 1.0e-08
     * Work counters
        Seconds run: 0 (vs limit Inf)
        Iterations: 35
        f(x) calls: 102
        ∇f(x) calls: 102

```julia
# Example 5 - again there is no convergence
optimize(f₅, [4.9,2.6,0.2], Optim.BFGS())
```

     * Status: success
     * Candidate solution
        Final objective value: 3.572548e-12
     * Found with
        Algorithm: BFGS
     * Convergence measures
        |x - x'| = 1.87e+05 ≰ 0.0e+00
        |x - x'|/|x'| = 8.70e-02 ≰ 0.0e+00
        |f(x) - f(x')| = 6.03e-16 ≰ 0.0e+00
        |f(x) - f(x')|/|f(x')| = 1.69e-04 ≰ 0.0e+00
        |g(x)| = 9.69e-09 ≤ 1.0e-08
     * Work counters
        Seconds run: 0 (vs limit Inf)
        Iterations: 41
        f(x) calls: 137
        ∇f(x) calls: 137